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Abstract 

The Voronoi diagram of a finite set of objects is a fundamental geometric structure 
that subdivides the embedding space into regions, each region consisting of the points 
that are closer to a given object than to the others. We may define many variants 
of Voronoi diagrams depending on the class of objects, the distance functions and the 
embedding space. In this paper, we investigate a framework for defining and building 
Voronoi diagrams for a broad class of distance functions called Bregman divergences. 
Bregman divergences include not only the traditional (squared) Euclidean distance but 
also various divergence measures based on entropic functions. Accordingly, Bregman 
Voronoi diagrams allow to define information-theoretic Voronoi diagrams in statistical 
parametric spaces based on the relative entropy of distributions. We define several 
types of Bregman diagrams, establish correspondences between those diagrams (using 
the Legendre transformation), and show how to compute them efficiently We also 
introduce extensions of these diagrams, e.g. /c-order and k-bag Bregman Voronoi dia- 
grams, and introduce Bregman triangulations of a set of points and their connexion with 
Bregman Voronoi diagrams. We show that these triangulations capture many of the 
properties of the celebrated Delaunay triangulation. Finally, we give some applications 
of Bregman Voronoi diagrams which are of interest in the context of computational 
geometry and machine learning. 
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1 Introduction and prior work 

The Voronoi diagram vor(«S) of a set of n points S = {pi, ...,p n } of the <i-dimensional 
Euclidean space IR d is defined as the cell complex whose <i-cells are the Voronoi regions 
{vor(pi)}j 6 {i j .. jn i where vor(pj) is the set of points of IR d closer to p« than to any other point 
of S with respect to a distance function 5: 

vor(pi) = {x G R d | 5( Pi ,x) < 5( Pi ,x) V Pj G S}. 

Points {pi}i are called the Voronoi sites or Voronoi generators. Since its inception in disguise 
by Descartes in the 17th century [5], Voronoi diagrams have found a broad spectrum of 
applications in science. Computational geometers have focused at first on Euclidean Voronoi 
diagrams [3] by considering the case where 5(x, y) is the Euclidean distance ||x — y|| = 

\f^2i=i( x i ~ Vi) 2 - Voronoi diagrams have been later on defined and studied for other distance 
functions, most notably the L\ distance ||x — y||i = Yli=i \ x i ~ Vi\ (Manhattan distance) 
and the distance | |x — y | loo = max ie{i,...,d} \ x i ~ Ui\ 021 E]. Klein further presented 
an abstract framework for describing and computing the fundamental structures of abstract 
Voronoi diagrams [221 HI] • 

In artificial intelligence, machine learning techniques also rely on geometric concepts for 
building classifiers in supervised problems {e.g., linear separators, oblique decision trees, 
etc.) or clustering datam unsupervised settings (e.g., fc-means, support vector clustering [2], 
etc.). However, the considered data sets S and their underlying spaces X are usually not 
metric spaces. The notion of distance between two elements of X needs to be replaced by a 
pseudo- distance that is not necessarily symmetric and may not satisfy the triangle inequality. 
Such a pseudo-distance is also referred to as distortion, (dis) similarity or divergence in the 
literature. For example, in parametric statistical spaces X , a vector point represent a distri- 
bution and its coordinates store the parameters of the associated distribution. A notion of 
"distance" between two such points is then needed to represent the divergence between the 
corresponding distributions. 

Very few works have tackled an in-depth study of Voronoi diagrams and their applications 
for such a kind of statistical spaces. This is all the more important even for ordinary Voronoi 
diagrams as Euclidean point location of sites are usually observed'm noisy environments (e.g., 
imprecise point measures in computer vision experiments), and "noise" is often modeled by 
means of Normal distributions (so-called "Gaussian noise"). To the best of our knowledge, 
statistical Voronoi diagrams have only been considered in a 4-page short paper of Onishi and 
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Figure V. Ordinary Euclidean Voronoi diagram of a given set S of seven sites. In the 
bounded Voronoi cell vor(p 6 ), every point p G vor(p 6 ) is closer to p6 than to any other 
site of S (with respect to the Euclidean distance). Dashed segments denote infinite edges 
delimiting unbounded cells. 

Imai [3l] which relies on Kullback-Leibler divergence of dD multivariate normal distributions 
to study combinatorics of their Voronoi diagrams, and subsequently in a 2-page video paper 
of Sadakane et al. [40J which defines the divergence implied by a convex function and its 
conjugate, and present the Voronoi diagram with flavors of information geometry pQ (see 
also |35j and related short communications [25j EH)- 0m study of Bregman Voronoi diagrams 
generalizes and subsumes these preliminary studies using an easier concept of divergence: 
Bregman divergences [T2| [6] that do not rely explicitly on convex conjugates. Bregman 
divergences encapsulate the squared Euclidean distance and many widely used divergences, 
e.g. the Kullback-Leibler divergence. It should be noticed however that other divergences 
have been defined and studied in the context of Riemannian geometry [lj. Sacrifying for 
some generality, while not very restrictive in practice, allows a much simpler treatment and 
our study of Bregman divergences is elementary and does not rely on Riemannian geometry. 

In this paper, we give a thorough treatment of Bregman Voronoi diagrams which elegantly 
unifies the ordinary Euclidean Voronoi diagram and statistical Voronoi diagrams. Our con- 
tributions are summarized as follows: 

• Since Bregman divergences are not symmetric, we define two types of Bregman Voronoi 
diagrams. One is an affine diagram with convex polyhedral cells while the other one is 
curved. The cells of those two diagrams are in 1-1 correspondence through the Legendre 
transformation. We also introduce a third-type symmetrized Bregman Voronoi diagram. 
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• We present a simple way to compute the Bregman Voronoi diagram of a set of points 
by lifting the points in a higher dimensional space using an extra dimension. This 
mapping leads also to combinatorial bounds on the size of these diagrams. We also 
define weighted Bregman Voronoi diagrams and show that the class of these diagrams is 
identical to the class of affine (or power) diagrams. Special cases of weighted Bregman 
Voronoi diagrams are the fc-order and fc-bag Bregman Voronoi diagrams. 

• We define two triangulations of a set of points. The first one captures some of the 
most important properties of the well-known Delaunay triangulation. The second 
triangulation is called a geodesic Bregman triangulation since its edges are geodesic 
arcs. Differently from the first triangulation, this triangulation is the geometric dual 
of the first-type Bregman Voronoi diagram of its vertices. 

• We give a few applications of Bregman Voronoi diagrams which are of interest in the 
context of computational geometry and machine learning. 

The outline of the paper is as follows: In Section [2j we define Bregman divergences and 
recall some of their basic properties. In Section |3j we study the geometry of Bregman spaces 
and characterize bisectors, balls and geodesies. Section [3] is devoted to Bregman Voronoi 
diagrams and Section[5]to Bregman triangulations. In Section|6j we select of few applications 
of interest in computational geometry and machine learning. Finally, Section [7] concludes 
the paper and mention further ongoing investigations. 

Notations. In the whole paper, X denotes an open convex domain of M. d and F : X \— > R 
a strictly convex and differentiable function. T denotes the graph of F, i.e. the set of points 
(x, z) G X x R where z = F(x). We write x for the point (x, F(x)) G T. VF, V 2 F and 
V _1 F denote respectively the gradient, the Hessian and the inverse gradient of F. 



2 Bregman divergences 



In this section, we recall the definition of Bregman^] divergences and some of their main 
properties (§|2.1 ). We show that the notion of Bregman divergence encapsulates the squared 
Euclidean distance as well as several well-known information-theoretic divergences. We intro- 



duce the notion of dual divergences (£2.2) and show how this comes in handy for symmetriz- 
ing Bregman divergences (§ |2.3[ ). Finally, we prove that the Kullback-Leibler divergence 
of distributions that belong to the exponential family of distributions can be viewed as a 
Bregman divergence 



£2.4) 



Lev M. Bregman historically pioneered this notion in the seminal work [12 on minimization of a convex 



objective function under linear constraints. See http://www.matli.bgu.ac.il/serv/segel/bregman.html 



We gratefully acknowledge him for sending us this historical paper. 
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D F (p\\q) 



Figure 2: Visualizing the Bregman divergence. -Dp(.||q) is the vertical distance between T 
and the hyperplane tangent to T at q. 

2.1 Definition and basic properties 

For any two points p and q of X C R d , the Bregman divergency Dp (-||-) : X \— > R of p to 
q associated to a strictly convex and differentiable function F (called the generator function 
of the divergence) is defined as 



lMp||q) = F(p) 



F(q)-(VF(q),p-q), 



(1) 



where VF = [J^- ... §^] T denotes the gradient operator, and (p, q) the inner (or dot) 
product: J2i=iPiQi- 

Informally speaking, Bregman divergence Dp is the tail of the Taylor expansion of F. See [IE] 
for an axiomatic characterization of Bregman divergences as "permissible" divergences. 

Lemma 1 The Bregman divergence Dp(\)\\(\) is geometrically measured as the vertical dis- 
tance between p and the hyperplane i7 q tangent to T at point q: Di?(p||q) = -F(p) — if q (p). 



Proof: The tangent hyperplane to hypersurface T : z = F(x.) at point q is if q : z = 
F(q) + (VF(q),x - q). It follows that D F (p\\q) = F(p) - tf q (p) (see Figure □ 

We now give some basic properties of Bregman divergences. The first property seems to be 
new. The others are well known. First, observe that, for most functions F, the associated 
Bregman divergence is not symmetric, i.e. -Di?(p||q) 7^ -Dp(q||p) (the symbol || is put to 
emphasize this point, as is standard in information theory). The following lemma proves 
this claim. 



2 See Java™ applet at http://www.csl.sony.co.jp/person/nielsen/BregmanDivergence/ 
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Lemma 2 Let F be properly defined for Dp to exist. Then Dp is symmetric if and only if 
the Hessian V 2 F is constant on X. 

Proof: (=>) From Eq. [TJ the symmetry D F (p\\q) = _Dp(q||p) yields: 

F(p) = F(q) + ^p-q,VF(q) + VF(p)> . (2) 
A Taylor expansion of F around q using the Lagrange form of the remainder also yields: 

F(p) = F(q) + (p-q,VF(q)) + ^(p-q) T V 2 F(q)(p-q) + i(p-q,VF) 3 (r pq )( ; 3) 
with r pq on the line segment pq. Equations ^ and ^ yield the following constraint: 

(p-q,VF(p)> = (p-q,VF(q)) + (p-q) T V 2 F(q)(p-q) + ^(p-q,VF) 3 (r pq )(4) 

On the other hand, if we make the Taylor expansion of VF around q and then multiply 
both sides by p — q, we separately obtain: 

(p - q, VF(p)> = (p - q, VF(q)> + (p - q) T V 2 F(q)(p - q) + I( p - q, VF) 3 (s pq ) , 

with s pq on the line segment pq. However, for this to equal Eq. Q, we must have (p — 
q, VF) 3 (r pq ) = (3/2) (p — q, VF) 3 (s pq ) for each p and q in X. If we pick p and q very close 
to each other, this equality cannot be true, except when the third differentials are all zero 
on r pq and s pq . Repeating this argument over each subset of X having non zero measure, 
we obtain that the third differentials of F must be zero everywhere but on subsets of X 
with zero measure, which implies that the second differentials (the Hessian of F, V 2 F) are 
constant everywhere on X. 

(<^=) Assume the hessian V 2 F is constant on X. In this case, because F is strictly convex, 
the Hessian V 2 F is positive definite, and we can factor it as V 2 F = P _1 DP where D is a 
diagonal matrix and P a unitary rotation matrix. Reasoning in the basis of X formed by 
P, each element x is mapped to Px, and we have -F(x) = ^2 { dixj, where the <ij's are the 
diagonal coefficients of D. The symmetry of Dp is then immediate (i.e., Dp is a generalized 
quadratic distance). □ 

Property 1 (Non-negativity) The strict convexity of generator function F implies that, 
for any p and q in X, Z}p(p||q) > ; with Dp(p\\q) = if and only if p = q. 

Property 2 (Convexity) Function Dp(p\\q) is convex in its first argument p but not nec- 
essarily in its second argument q. 
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Bregman divergences can easily be constructed from simpler ones. For instance, multivariate 
Bregman divergences Dp can be created from univariate generator functions coordinate-wise 
as F(x) = Eti with VF = [g ... g] r 

Because positive linear combinations of strictly convex and differentiable functions are 
strictly convex and differentiable functions, new generator functions (and corresponding 
Bregman divergences) can also be built as positive linear combinations of elementary gen- 
erator functions. This is an important property as it allows to handle mixed data sets of 
heterogenous types in a unified framework. 

Property 3 (Linearity) Bregman divergence is a linear operator, i.e., for any two strictly 
convex and differentiable functions Fi and F 2 defined on X and for any A > 0: 

D F 1+ xF 2 (p\\q) = D Fl (p\\q) + XD F2 (p\\q). 

Property 4 (Invariance under linear transforms) G(x) = F(x) + (a, x) + b, with 
a £ IR d and b £ M., is a strictly convex and differentiable function on X, and Dc(p\\q) = 
D F (p\\q). 

Examples of Bregman divergences are the squared Euclidean distance (obtained for F(x) = 
||x|| 2 and the generalized quadratic distance function F(x) = x T Qx where Q is a positive 
definite matrix. When Q is taken to be the inverse of the variance-covariance matrix, Dp 
is the Mahalanobis distance, extensively used in computer vision. More importantly, the 
notion of Bregman divergence encapsulates various information measures based on entropic 
functions such as the Kullback-Leibler divergence based on the (unnormalized) Shannon 
entropy, or the Itakura-Saito divergence based on Burg entropy (commonly used in sound 
processing). Table [l] lists the main univariate Bregman divergences. 

2.2 Legendre duality 

We now turn to an essential notion of convex analysis: Legendre transform that will allow 
us to associate to any Bregman divergence a dual Bregman divergence. 

Let F be a strictly convex and differentiable real- valued function on X. The Legendre 
transformation makes use of the duality relationship between points and lines to associate 
to F a convex conjugate function F* : W 1 i— > M given by [38J : 

F*(y) = sup{(y,x)-F(x)}. 

The supremum is reached at the unique point where the gradient of G(x) = (y,x) — F(x) 
vanishes or, equivalently, when y = VF(x). 
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Dom. X 


Function F 


Gradient 


Inv. grad. 


Divergence Dp(p\\q) 


R 


Squared function 

a; 2 


2x 


X 

2 


Squared loss (norm) 
(p - q) 2 


M+,q e N 
a > 1 


Norm-like 

x a 


ax a ~ l 


1 


Norm-like 

p a + (q — l)q a — apq a ~ x 


R + 


Unnorm. Shannon entropy 
x log x — X 


log X 


exp(x) 


Kullback-Leibler div. (I-div.) 
plog2 -p + q 


R 


Exponential 
exp x 


exp x 


log X 


Exponential loss 

exp(p) — (p — q + 1) exp(g) 


R+* 


Burg entropy 
— log X 


i 


i 

a; 


Itakura-Saito divergence 
2 - log 2 - 1 


[0,1] 


Bit entropy 

x log x + {1 — x) log(l — x) 




exp a; 

1 1 pvn t* 

_L 1 A J. 


Logistic loss 

plogf + (1 -p) log^f 


R 


Dual bit entropy 
log(l + expx) 


oxp X 
1+exp x 


log t^- 
& 1— a; 


Dual logistic loss 

log 1+ex PP ( v a ) exp<? 

iu & 1+cxp g 1+cxp g 


[-1,1] 


Hellinger-like 


X 

Vl-x 2 


a; 

Vl+a; 2 


Hellinger-like 


-Vl - x 2 





Table 1: Some common univariate Bregman divergences Dp. 

As is well-known, F* is strictly convex. To see this, consider the epigraph epi(F*), i.e. the set 
of points (y,z) such that F*(y) < z. Clearly, (y,z) G epi(F*) iff G x (y) = (y,x) — F(x) < z 
for all x G X. Therefore, epi(F*) = n x6 A , epi(G' x ). Since G x (y) is an affine function, epi(G x ) 
is a half-space and epi(F*) being the intersection of half-spaces is a convex set, which proves 
that F* is convex. The strict convexity follows from the fact that otherwise, F would not be 
differentiate in at least one point z G X: at this point, (y a , z) —F(z) > (y a , x) — F(x), Vx G 
X, and y a = ay\ + (1 — a)y2, Va G [0, 1], yiy2 being a segment on which F* is not strictly 
convex. Thus, yiy2 would be a subdifferential of F in z contradicting the fact that F is 
differentiable. 

For convenience, we write x' = VF(x) (omitting the F in the x' notation as it should be clear 
from the context). Figure [3] gives a geometric interpretation of the Legendre transformation. 
Using this notation, Eq. [T] can be rewritten as 

ZMp||q) = F(p) - F(q) - (q', p - q). (5) 

Since F is a strictly convex and differentiable real- valued function on X, its gradient VF is 
well defined as well as its inverse V _1 F. Writing X' for the gradient space {Vf (x) = x'|x G 
X}, the convex conjugate F* of F is the function: X' C R i— > R defined by 

F*(x') = (x,x')-F(x). (6) 
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Figure 3: Legendre transformation of a strictly convex function F: The ^-intercept 
(0, — F*(y')) of the tangent hyperplane H y : z = (y',x) — F*(y') of T at y defines the 
value of the Legendre transform F* for the dual coordinate y' = V-F(y). Any hyperplane 
passing through an other point of T and parallel to H y necessarily intersects the z-axis above 
-F*{y'). 



Deriving this expression, we get 

(VF*(x'),dx') = (x,dx') + (x',dx) - (VF(x),dx) = (x, dx') = (V^i^x'), dx'), 

from which we deduce that VF* = 'V~ 1 F. From Eq. [6j we also deduce (F*)* = F. 

From the above discussion, it follows that Dp* is a Bregman divergence, which we call the 
Legendre dual divergence of Dp. We have : 



Lemma 3 £> F (p||q) = F(p) + F*(q') - (p, q') = ^.(q'Hp') 



Proof: By Eq. |5j Dp(p||q) = F(p) — F(q) — (p — q, q'), and, according to Eq. [6j we have 
*XP) = (p',p) -^*(p') and F(q) = (q', q) - F*(q'). Hence, D F (p\\q) = (p', p) - F*{p>) - 
(p,q') +F*(q') = ^.(q'Hp') since p = VF~ 1 VF(p) = VF(p'). □ 

Observe that, when Dp is symmetric, Dp* is also symmetric. 

The Legendre transform of the quadratic form -F(x) = ^x T Qx, where Q is a symmetric 
invertible matrix, is F*(y) = ^y T Q _1 y (corresponding divergences Dp and Dp* are both 
generalized quadratic distances). 

To compute F*, we use the fact that VF* = V _1 F and obtain F* as F* = JV~ 1 F. For 
example, the Hellinger-like measure is obtained by setting F(x) = —y/1 — x 2 (see Table TJ. 
The inverse gradient is ^=f and the dual convex conjugate is J = v 1 + x 2 . Inte- 

grating functions symbolically may be difficult or even not possible, and, in some cases, it 
will be required to approximate numerically the inverse gradient V _1 F(x). 
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Let us consider the univariate generator functions defining the divergences of Table [TJ Both 
the squared function F(x) = x 2 and Burg entropy F(x) = — logx are self-dual, i.e. F = F*. 
This is easily seen by noticing that the gradient and inverse gradient are identical (up to 
some constant factor). 

For the exponential function F(x) = expx, we have F*(y) = y\ogy — y (the unnormalized 
Shannon entropy) and for the dual bit entropy F(x) = log(l + expx), we have F*(y) = 
ylogj^ + log(l — y), the bit entropy. Note that the bit entropy function is a particular 
Bregman generator satisfying F(x) = F(l — x). 

2.3 Symmetrized Bregman divergences 

For non-symmetric <i-variate Bregman divergences Dp, we define the symmetrized divergence 

5p(p,q) = S F (q,p) = - (^(pHq) + D F (q\\p)) = ^(p - q, p' - q')- 

An example of such a symmetrized divergence is the symmetric Kullback-Leibler divergence 
(SKL) widely used in computer vision and sound processing (see for example [2S])- 

A key observation is to note that the divergence S F between two points of X can be measured 
as a divergence in X x X 1 C M. 2d . More precisely, let x = [x x'] T be the 2<i-dimensional vector 
obtained by stacking the coordinates of x on top of those of x', the gradient of F at x. We 
have : 

Theorem 1 SV(p, q) = |.Dp(p||q) where F(5t) = F(x) + F*(x') and D F is the Bregman 
divergence defined over X x X' C M. 2d for the generator function F. 

Proof: Using Lemma |3j we have 

Sf(p, q) = \ (D F (p\\q) + D F (q\\p)) = ^ (D F (p\\q) + Dp* (p'| |q')) = 1 -D F (p\\q) 

□ 

It should be noted that x lies on the (i-manifold X = {[x x'] T | x e W 1 } of M. 2d . Note also 
that ^(p, q) is symmetric but not a Bregman divergence in general since X may not be 
convex, while D F is a non symmetric Bregman divergence in X x X' . 

2.4 Exponential families 

2.4.1 Parametric statistical spaces and exponential families 

A statistical space X is an abstract space where coordinates of vector points G X encode 
the parameters of statistical distributions. The dimension d = dim X of the statistical space 



10 



coincides with the finite number of free parameters of the distribution laws. For example, 
the space X — {\p a] T \ (fi, a) G R x IR+} of univariate normal distributions A/"(/i, a) is 
a 2D parametric statistical space, extensively studied in information geometry [1] under 
the auspices of differential geometry. A prominent class of distribution families called the 
exponential families Ep p] admits the same canonical probability distribution function 

p(x\0) = exp{(0, f(x)) - F{0) + C(x)}, (7) 

where f(x) denotes the sufficient statistics and e X represents the natural parameters. 
Space X is thus called the natural parameter space and, since log j x p(x\0)dx = log 1 = 0, we 
have F(0) = log J exp{(0,f(x)) + C(x)}dx. F is called the cumulant function or the log- 
partition function. F fully characterizes the exponential family Ep while term C (x) ensures 
density normalization. (That is, p(x\0) is indeed a probability density function satisfying 
J x p(x\0)dx = 1.) 

When the components of the sufficient statistics are affinely independent, this canonical rep- 
resentation is said to be minimal, and the family E p is called a full exponential family of order 
d = dim A". Moreover, we consider regular exponential families Ep that have their support 
domains topologically open. Regular exponential families include many famous distribution 
laws such as Bernoulli (multinomial), Normal (univariate, multivariate and rectified), Pois- 
son, Laplacian, negative binomial, Rayleigh, Wishart, Dirichlet, and Gamma distributions. 
Table [2] summarizes the various relevant parts of the canonical decompositions of some of 
these usual statistical distributions. Observe that the product of any two distributions of the 
same exponential family is another exponential family distribution that may not have any- 
more a nice parametric form (except for products of normal distribution pdfs that yield again 
normal distribution pdfs). Thus exponential families provide a unified treatment framework 
of common distributions. Note, however, that the uniform distribution does not belong to 
the exponential families. 

2.4.2 Kullback-Leibler divergence of exponential families 

In such statistical spaces X, a basic primitive is to measure the distortion between any two 
distributions. The Kullback-Leibler divergence (also called relative entropy or information 
divergence, /-divergence) is a standard information-theoretic measure between two statistical 

distributions d± and d 2 defined as KL^Hg^) = j x di(x) log ^^dx. This statistical measure 
is not symmetric nor does the triangle inequality holds. 

The link with Bregman divergences comes from the remarkable property that the Kullback- 
Leibler divergence of any two distributions of the same exponential family with respective 
natural parameters p and q is obtained from the Bregman divergence induced by the cumu- 
lant function of that family by swapping arguments. By a slight abuse of notations, we denote 
by KL(0 P | \0 g ) the oriented Kullback-Leibler divergence between the probability density func- 
tions defined by the respective natural parameters, i.e. KL(0 p \\0 q ) = f j x p(x\0 p ) log 4f|§4dx. 
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Exponential family 
Canonical probability density function: exp{(0, f(x)) — F(0) + C(x)} 


Natural 
parameters 6 


Sufficient 
statistics f(x) 


Cumulant function F(0) 


Dens. Norm. 

C(x) 


Bernouilli B(q) (Tossing coin with Pr(heads) = q and Pr(tails) — 1 — q) 


° 1—q 


X 


log(l + exp0) 





Multinomial M.(q±, ...,qd+i) (Extend Bernouilli with Pr(xj) = qi and Ei9» = 1) 


8i = log ^ 


/i(x) = x t 


Mi + Eti ex P^) 





Beta (3(61,62) (Bernouilli conjugate prior) 


[0i 2 ] T 


[log a; log(l — x)] T 


\ogB( 


0i 


+ 1,02 + 1) 





F(0) = log r( 
(with F(x) = J °° t x ~ l exp(- 


0i+i)r(0 2 +i) 
r(0i+e 2 -2) 

t)dt) = (x - l)T(x - 1)) 


Univariate Normal A/"(/x, <r 2 ) 


[M -11T 

U 2 2<T 2 J 


[x x 2 ] T 


°l , 1 


log(-^) 





Multivariate Normal 7V(/u, S) 


[s-V - is- 1 ] 


[x xx T ] 




-1 


fi+ llogdet(27rS) 





Rayleigh 7Z(cr 2 ) (used in 


ultrasound imageries) 


1 

2<7 2 


x 2 




logo; 


Laplacian C(9) (used in 


radioactivity decay) 




—x 


-log 


9 







Poisson V(X) (counting process) 


log A 


X 


exp 


— log x\ 


Gamma 7(0i, 6 2 ) (waiting times in Poisson processes) 


[61 6 2 f 


[log a; x] T 


io g r(0 1 + i) + (0 2 + i)io g (-0 2 ) 





Dirichlet T>(a) (varying proportion model 


X 


= 1, conjugate prior of Multinomial) 


6i = a,i-l 


/*(*) = logXi 


iogr(E l ^ + ^)-E i r(^ + i) 






Table 2: Canonical decompositions of usual exponential families. 



The following theorem is the extension to the continuous case of a result mentioned in [H] . 



Theorem 2 The Kullback-Leibler divergence of any two distributions of the same expo- 
nential family with natural parameters P and q is obtained from the Bregman divergence 
induced by the cumulant function F as: KL(0 p \\0 q ) = D F (0 q \\0 p ) . 

Before proving the theorem, we note that 



VF(0) 



f (x) exp{<0, f (x)) - F{0) + C{x)}dx 



The coordinates of n = f VF(0) = [ f f (x)p(x\0)dx] = Ee(f(x)) are called the expecta- 
tion parameters. As an example, consider the univariate normal distribution J\f(/J,, a) with 
sufficient statistics [x x 2 ] T (see Table [2]). The expectation parameters are = Vp(0) = 
[/i /i 2 + cr 2 ] T , where \x = J x xp(x\0)dx and /i 2 + cr 2 = J x x 2 p(x\0)dx. 

We now prove the theorem. 

Proof: 

KL(0 p \\0 q ) = j^ x \0 p )\ og &Mdx 

= [ p(x\0 p )(F(0 q ) - F(0 P ) + (0 P - q ,f(x)})dx 

= I p(x\0 p ) (D F (0 q \\0 p ) + (0 q - P , VF(0 p )) + (0 P - q , f (x)» dx 

J X 

= D F (0 q \\0 p )+ [ p(x\0 p )(0 q -0 p ,VF(0 p )-f(x)))dx 

J X 

= D F {0 q \\0 p ) - Jp{x\0 p ){0 q - P , f{x))dx + (0 q - P , VF(0 p )) 
^ D F (0 q \\0 p ) 



□ 



2.4.3 Dual parameterizations and dual divergences 

The notion of dual Bregman divergences introduced earlier and dual parameterizations ex- 
tend naturally to statistical spaces. Since, n = VF(0) (Eq. pj), the convex conjugate of F(0) 
is F*(fjt) = (0, fjt) — F{0) (Eq. [6]). From Lemma |3l we then deduce the following theorem. 



Theorem 3 D F (0 p \\0 q ) = D F *(fi q \\fi p ) where F* denote the convex conjugate of F. 
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Bernouilli dual divergences: Logistic loss/binary relative entropy 


F(0) = k>g(l + exp0) 




f( ) = i+cxpe = /* 


F*(/i) = M lo gAt + (l-/i)log(l- M ) 


D F ,( / i'|| M )=/i'lo g ^ + (l-/i)lo g i^ 


r( f j,) = iog T ^ = e 


Poisson dual divergences: Exponential loss/Unnormalized Shannon entropy 


F(0) = cxp6> 


D F (6\\6') = cxp 6 - cxp 6' - (9 - 6') cxp 6' 


f(0) = cxp 6> = /ii 


F*(n) = filogfj,- fi 


D F ,( M '|| M ) = M 'lo g ^ + M -^ 


/*( M ) = log/z = 



Table 3: Examples of dual parameterizations of exponential families and their corresponding 
Kullback-Leibler (Bregman) divergences for the Bernoulli and Poisson distributions. 



Table [3] presents some examples of dual parameterizations of exponential families (i.e., the 
natural ^-parameters and expectation /x-parameters and dual Legendre cumulant functions), 
and describe the corresponding Bregman divergences induced by the Kullback-Leibler diver- 
gences. 

Finally, we would like to point out that Banerjee et al. [6] have shown that there is a 
bijection between the regular exponential families and a subset of the Bregman divergences 
called regular Bregman divergences. 



3 Elements of Bregman geometry 

In this section, we discuss several basic geometric properties that will be useful when studying 
Bregman Voronoi diagrams. Specifically, we characterize Bregman bisectors, Bregman balls 
and Bregman geodesies. Since Bregman divergences are not symmetric, we describe several 



types of Bregman bisectors in £3.1 We subsequently characterize Bregman balls by using 



a lifting transform that extends a construction well-known in the Euclidean case (£3.2) 



Finally, we characterize geodesies and show an orthogonality property between bisectors and 



geodesies in £3.3 



3.1 Bregman bisectors 

Since Bregman divergences are not symmetric, we can define several types of bisectors. The 
Bregman bisector of the first type is defined as 

H F (p,q) = {x G X | £>f(x|| P ) = £> F (x||q)}. 

Similarly, we define the Bregman bisector of the second type as 

#i-(p,q) = {x e X | D F (p\\x) = D F (q\\x)}. 
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These bisectors are identical when the divergence is symmetric. However, in general, they 
are distinct, the bisectors of the first type being linear while the bisectors of the second type 
are potentially curved (but always linear in the gradient space, hence the notation). More 
precisely, we have the following lemma 

Lemma 4 The Bregman bisector of the first type H F (p, q) is the hyperplane of equation: 

H F (p, q) : (x, p' - q'> + F(p) - (p, p') - F(q) + (q, q'> = 
The Bregman bisector of the second type H' F (p,q) is the hypersurface of equation 

H' F (p, q) : (x\ q - p) + F(p) - F(q) = 
(a hyperplane in the gradient space X '). 

It should be noted that p and q lie necessarily on different sides of H F (p,q) since 
H F (p,q){p) = -D F (p\\q) < and # F (p,q)(q) = D F (q||p) > 0. 



From Lemma [3j we know that D F (x\\y) = _DF*(y'||x') where F* is the convex conjugate of 
F. We therefore have 

H F (p,q) = V^FOH^p')), 
H' f (p,<D = V- l F{HF.(q[,p')). 

Figure [4] depicts several first- type and second- type bisectors for various pairs of primal/dual 
Bregman divergences. 

The bisector H F (p,q) for the symmetrized Bregman divergence S F is given by 

H F (p, q) : (x, q - p) + (x, q - p) + (p, p) - (q, q) = 0. 

Such a bisector is not linear in x nor in x'. However, we can observe that the expression 
is linear in x = [x x'] T . Indeed, proceeding as we did in £2.3. we can rewrite the above 
equation as 

x 
x' 



Hp{p,q) : 



q'-p' 

q- p 



+ (p,p / )-(q ) q') = o. 



which shows that H F (p, q) is the projection on X of the intersection of the hyperplane 
H(p, q) of M. 2d with the c?-dimensional manifold X = {x = [x x'] T | x G X}. 



3.2 Bregman spheres and the lifting map 

We define the Bregman balls of, respectively, the first and the second types as 

B F (c,r) = {x 6 X | D F (x\\c) < r} and B' F (c, r) = {x G X \ D F (c\ |x) < r} 
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Source space X 



Gradient space X' 





Figure 4: Bregman bisectors: first-type linear bisector and second-type curved bisector are 
displayed for pairs of primal/dual Bregman divergences: (a) exponential loss/unnormalized 
Shannon entropy, (b) logistic loss/dual logistic loss, and (c) self-dual Itakura-Saito diver- 
gence. The grid size of M 2 in X and X' is ten ticks per unit. First-type (primal linear/dual 
curved) and second-type (primal curved/dual linear) bisectors are respectively drawn in red 
and blue. 
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(a) (b) (c) 

Figure 5: Bregman balls for the Itakura-Saito divergence. The (convex) ball (a) of the first 
type Bp(c,r), (b) the ball of the second type B' F (c,r) with the same center and radius, (c) 
superposition of the two corresponding bounding spheres. 

The Bregman balls of the first type are convex while this is not necessarily true for the balls 
of the second type as shown in Fig. [5] for the Itakura-Saito divergence (defined in Table [I]). 
The associated bounding Bregman spheres are obtained by replacing the inequalities by 
equalities. 

From Lemma [3j we deduce that 

B' F (c 1 r) = V- 1 F(B F *(c',r)). (9) 

Let us now examine a few properties of Bregman spheres using a lifting transformation that 
generalizes a similar construct for Euclidean spheres (see pTJ| 155]). 

Let us embed the domain XinX = Xx'Rc IR d+1 using an extra dimension denoted by the 
Z-&xis. For a point x G X, recall that x = (x, .F(x)) denotes the point obtained by lifting x 
onto T (see Figure [l]). In addition, write Proj^(x, z) — x for the projection of a point of X 
onto X . 

Let p G X and H p be the hyperplane tangent to T at point p of equation 

z = iJ p (x) = (x-p,p / )+F(p), 

and let denote the halfspace above H p consisting of the points x = [x z] T G X such that 
z > H p (x). Let a(c, r) denote either the first-type or second-type Bregman sphere centered 
at c with radius r (i.e., dBp(c,r) or dB' F (c,r)). 

The lifted image a of a Bregman sphere a is a = {(x, F(x)),x G a}. We associate to a 
Bregman sphere a = <r(c, r) of X the hyperplane 

H <T :z=(x-c,c') + F(c)+r, (10) 

parallel to H c and at vertical distance r from H c (see Figure [6]). Observe that H a coincides 
with H c when r = 0, i.e. when sphere a is reduced to a single point. 



17 



(a) Squared Euclidean distance 



(b) Itakura-Saito divergence 



Figure 6: Two Bregman circles a and the associated curves a obtained by lifting a onto 
T. The curves a are obtained as the intersection of the hyperplane H a with the convex 
hypersurface T . 3D illustration with (a) the squared Euclidean distance, and (b) the Itakura- 
Saito divergence. 



Lemma 5 a is the intersection of T withH a . Conversely, the intersection of any hyperplane 
H with T projects onto X as a Bregman sphere. More precisely, if the equation of H is 
z = (x, a) + b, the sphere is centered at c = V _1 F(a) and its radius is (a, c) — -F(c) + b. 



Proof: The first part of the lemma is a direct consequence of the fact that -D^(x||y) is 
measured by the vertical distance from x to H y (see Lemma [TJ. For the second part, we 



consider the hyperplane H" parallel to H and tangent to T. From Eq. 10, we deduce 
a = c'. The equation of is thus z = (x - V _1 F(a),a) + F(V _1 F(a)). It follows that 
the divergence from any point of a to c, which is equal to the vertical distance between H 
and is (V~ 1 F(a), a) - F(V^ 1 F(a)) + b = (a, c) - F(c) + b. □ 

Bregman spheres have been defined as manifolds of codimension 1 of lR d , i.e. hyperspheres. 
More generally, we can define the Bregman spheres of codimension k+1 of lR d as the Bregman 
(hyper) spheres of some affine space Z C M. d of codimension k. The next lemma shows that 
Bregman spheres are stable under intersection. 



Lemma 6 The intersection of k Bregman spheres ax, . . . , is a Bregman sphere a. If the 
o~i pairwise intersect transversally, a = nf =1 o"j is a k-Bregman sphere. 



Proof: Consider first the case of Bregman spheres of the first type. The k hyperplanes 
H ai , i — 1, . . . , k intersect along an affine space H of codimension k of that vertically 
projects onto G. Let G* = G x R be the vertical flat of codimension k that contains G (and 
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H) and write Tq = T fl G* and H G = H n(?. Note that JF G is the graph of the restriction 
of F to G and that iJ^ is a hyperplane of G^. We can therefore apply Lemma [5] in G$, which 
proves the lemma for Bregman spheres of the first type. 

The case of Bregman spheres of the second type follows from the duality of Eq. |9j □ 



Union and intersection of Bregman balls 

Theorem 4 The union of n Bregman balls has combinatorial complexity G(nL^H) and can 
be computed in time 0(nlogn + nLiH). 

Proof: To each ball, we can associate its bounding Bregman sphere <jj which, by Lemma [5j 
is the projection by Proj^ of the intersection of T with a hyperplane H a .. The points of 
T that are below H ai projects onto points that are inside the Bregman ball bounded by erj. 
Hence, the union of the balls is the projection by Proj^. of the complement of T fl TC where 
TO = n™ =1 ifj.. TO is a convex polytope defined as the intersection of n half-spaces. The 
theorem follows from McMullen's theorem that bounds the number of faces of a polytope [31], 
and Chazelle's optimal convex hull/half-space intersection algorithm [14J. The result for the 
balls of the second type is deduced from the result for the balls of the first type and the 
duality of Eq. |9j □ 

Very similar arguments prove the following theorem (just replace Hi by the complementary 
halfspace H^). 

Theorem 5 The intersection of n Bregman balls has combinatorial complexity B(nL^J) 
and can be computed in time 6(nlogn + nl-^H). 



Circumscribing Bregman spheres. There exists, in general, a unique Bregman sphere 
passing through d + 1 points of Mr. This is easily shown using the lifting map since, in 
general, there exists a unique hyperplanes of M. d+1 passing through d + 1 points. The claim 
then follows from Lemma [U 

Deciding whether a point x falls inside, on or outside a Bregman sphere a G M. d passing 
through d + 1 points of po, Pd will be crucial for computing Bregman Voronoi diagrams 
and associated triangulations. The lifting map immediately implies that such a decision task 
reduces to determining the orientation of the simplex (p , ...,pd,x) of which in turn 

reduces to evaluating the sign of the determinant of the (d + 2) x (d + 2) matrix (see [32J) 



InSphere(x;p , ...,Pd) 



1 

Po 

^(p ; 
. 1 



is non-zero, InSphere(x; p , Pd) is 



If one assumes that the determinant 

Po ••■ Pd 

negative, null or positive depending on whether x lies inside, on, or outside a. 



1 

Pd 



1 
x 

F(x) 
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Figure 7: Generalized Pythagoras' theorem for Bregman divergences: The projection p>v of 
point p to a convex subset W C X. For convex subset W, we have D F (w\ |p) > D F (w\\p w ) + 
-Df(pwIIp) (with equality for and only for affine sets W). 



3.3 Projection, orthogonality and geodesies 

We start with an easy property of Bregman divergences. 

Property 5 (Three-point property) For any triple p,q and r of points of X , we have: 
D F (p\\q) + D F (q\ |r) = D F (p\\r) + (p - q, r' - q') . 

The following lemma characterizes the Bregman projection of a point onto a closed convex 
set W. 



Lemma 7 (Bregman projection) For any p, there exists a unique point x G W i/jai 
minimizes D F (x.\\p). We call this point the Bregman projection of p onto W and denote it 
Pw- 



Proof: If it is not the case, then define x and y two minimizers with D F (pc\ |p) = D F (y\\p) = 
I. Since W is convex, (x+y)/2 G W and, since D F is strictly convex in its first argument (see 



Section^, D F ((x+y)/2||p) < D F (x\ \p)/2 + D F (y\ |p)/2. But D F (x| |p)/2 + ,D F (y| |p)/2 = 
I yielding a contradiction. □ 

We now introduce the notion of Bregman orthogonality. We say that pq is Bregman orthog- 
onal to qr iff D F (p||q) + D F (q||r) = Dp(p||r) or equivalently (by the Three-point property), 
if and only if (p — q, r' — q') = 0. Observe the analogy with Pythagoras' theorem in Eu- 
clidean space (see Figure [7]). Note also that the orthogonality relation is not symmetric: the 
fact that pq is Bregman orthogonal to qr does not necessarily imply that qr is Bregman 
orthogonal to pq. More generally, we say that / C X is Bregman orthogonal to J C X 
(I l~l J 0) iff for any p 6 J and r G J, there exists a q G / D J such that pq is Bregman 
orthogonal to qr. 

Notice that orthogonality is preserved in the gradient space. Indeed, since (p — q, r' — q') = 
(r 7 — q', p — q), pq is Bregman orthogonal to qr iff r'q' is Bregman orthogonal to q'p'. 

Let T F (p, q) be the image by V _1 F of the line segment p'q', i.e. 

IV(p, q) = {x G X : x' = (1 - A)p' + Aq\ A G [0, 1]}. 
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By analogy, we rename the line segment pq as 

A(p, q) = {x G X : x = (1 - A)p + Aq, A G [0, 1]} 

In the Euclidean case {F(x) = ^||x|| 2 ), IV(p, q) = A(p, q) is the unique geodesic path joining 
p to q and it is orthogonal to the bisector Hp(p, q). For general Bregman divergences, we 
have similar properties as shown next. 

Lemma 8 IV(p,q) is Bregman orthogonal to the Bregman bisector Hp(p,q) while A(p,q) 
is Bregman orthogonal to ifp. (p,q). 

Proof: Since p and q lie on different sides of Hp(p,q), IV(p,q) must intersect Hp(p,q)- 
Fix any distinct x G T(p, q) and y G Hp(p, q), and let t G T(p, q) R Hp(p, q). To prove the 
first part of the lemma, we need to show that (y — t, x' — t') = 0. 

Since t and x both belong to G r^(p,q), we have t' — x' = A(p' — q'), for some A 6 K, 
and, since y and t belong to Hp(p,q), we deduce from the equation of Hp(p,q) that 
(y — t,p' — q') = 0. We conclude that (y — t,x' — t') = 0, which proves that r^(p,q) is 
indeed Bregman orthogonal to H F (p, q). 

The second part of the lemma is easily proved by using the fact that orthogonality is preserved 
in the gradient space as noted above. □ 

Figure [8] shows Bregman bisectors and their relationships with respect to A(p, q) and 

rv(p,q). 

We now focus on characterizing Bregman geodesies. First, recall that a parameterized curve 
C between two points p and pi is defined as a set C = {pa}a=o> which is continuous. In 
Riemannian geometry, geodesies are the curves that minimize the arc length with respect 
to the Riemannian metric [TJ |27j . Since embedding X with a Bregman divergence does not 
yield a metric space, we define the following curve lengths: 

4(C) = f D F (p \\p x )dX , (11) 

JA=0 

4(C) = / J D F (p A ||p )dA. (12) 

We now characterize the dual pair of geodesies and their lengths as follows: 

Lemma 9 Curve Yf{po-,Pi) (respectively straight line segment A(po,pi)j minimizes 
Jx =0 -Df(po| |PA)dA (respectively f x=Q Df(p\\\po)(1\) over all curves C = {pa}a=o- 

Proof: For any curve C between po and pi, we measure the 4 length as 4(C) = 
f x Z^i?(pA||po)dA. Fix some inner point p G r^po, Pi)\{po? Pi}- From the three-point 
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Figure 8: Bregman bisectors and their relationships with respect to A(p, q) (straight line 
segments) and IV (p, q) (bold curves), for the Itakura-Saito divergence (left) and I-divergence 
(right). Bold curves become linear in X'; colors depict the Bregman orthogonality relation- 
ships of Lemma [8] 

property (Property [5]), the set of points {y G X | ZV(y||po) = D F (y\ |p) + D F (p\ |po)} is the 
hyperplane H p : (y, h) = b (h is a perpendicular vector to the hyperplane) which splits X into 
two open half-spaces : (y, h) > b, and H~ : (y, h) < b. Now, H p intersects T(p , pi) since 
H p separates p from p x . Indeed, F p (p ) = (p - P, p' - p') = £V(Po| |p) + £V(p| |Po) > 
and Fp(pi) = (p 1 - p,p' - p'} = ^(pi - p, p[ - p') < where p' = Ap' + (1 - A)pi 
(with A g]0, 1[). Therefore any connected path C joining p to pi has to intersect H p . 

To finish up, consider function / : [0, 1] — > C with /(0) = po, /(l) = Pi, and /(A) G C fl H Px 
otherwise, where it is understood that pa is hereafter a point of rV(p ,pi). Since /(A) G 
H pW , we have D F (f(X)\\p ) = D F (f(X)\\p x ) +-Df(pa||Po) > D F (p x \\Po), with equality if 
and only if /(A) = pa- Thus we have 

MMPo,Pi))= / £V(PA||Po)dA < / D F {f{X)\\p )d\<e r (C) . 
Jx=o Jx=o 

The case of A(po,pi) follows similarly from Legendre convex duality. 
□ 

Corollary 1 Since rV(p ,pi) = r F (pi,p ) (respectively, since A(p ,pi) = A(p 1 ,p ) y ) 
we deduce that rV(p ,Pi) minimizes also J A 1 =0 i^^(pi||pA)dA (respectively, minimizes also 
J x=0 D F (p x \\pi)dX) over all curves C = {pa}I =0 - 
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Observe also that IV(p,q) is the unique geodesic path joining p to q in X for the metric 
image by "V~~ X F of the Euclidean metric. 

Finally, we give a characterization of these geodesies in information-theoretic spaces. Recall 
that Banerjee et al. j6] showed that Bregman divergences are in bijection with exponen- 
tial families. This was emphasized by Theorem [2] that proved that the Kullback-Leibler 
divergence of probability density functions of the same exponential family £p is a Bregman 
divergence Dp for the cumulant function F. From this standpoint, A(p,q) and Tp(p,q) 
minimize the total Kullback-Leibler divergence, a characteristic that we choose to call the 
information length of a curve. Since the Kullback-Leibler divergence is not symmetric, this 
justifies for the existence of two geodesies, one which appears to be linear when parame- 
terized with the natural affine coordinate system (6), and the other that is linear in the 
expectation affine coordinate system (n). See also [I]. 

Corollary 2 Suppose p(.\0o) and p(.\0i) are probability density functions of the same expo- 
nential family £p- Then Tf(8 ,6i) (resp. A(0 O , 6 \)) minimizes tv{C) = J x=0 KL(0 o ||0A)dA 
(resp. £a(C) = f x=0 KL(0\\\do)d\) over all curves C = {p(-\0\)}\ =0 - 



4 Bregman Voronoi diagrams 

Let S = {pi, p n } be a finite point set of X C M. d . To each point p» is attached a <i-variate 
continuous function Di defined over X . We define the lower envelope of the functions as the 
graph of min!<j<„ Di and their minimization diagram as the subdivision of X into cells such 
that, in each cell, argmin, fa is fixed. 

The Euclidean Voronoi diagram is the minimization diagram for Dj(x) = ||x — Pi|| 2 . In 
this section, we introduce Bregman Voronoi diagrams as minimization diagrams of Bregman 



divergences (see Figure 10) 



We define three types of Bregman Voronoi diagrams in £4.1 We establish a correspondence 



between Bregman Voronoi diagrams and polytopes in £4.2 and with power diagrams in 



£4.3 These correspondences lead to tight combinatorial bounds and efficient algorithms. 



Finally, in §4.4[ we give two generalizations of Bregman Voronoi diagrams; fc-order and 
/c-bag diagrams. 

We note S' = {Vf(Pi), i = 1, . . . ,n} the gradient point set associated to S. 



4.1 Three types of diagrams 

Because Bregman divergences are not necessarily symmetric, we associate to each site Pi 
two types of distance functions, namely -D;(x) = _Dp(x||pj) and -D^(x) = Dp(pj||x). The 
minimization diagram of the Di, i = l,...,n, is called the first-type Bregman Voronoi 
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diagram of S, which we denote by vorp(«S). The <i-dimensional cells of this diagram are in 
1-1 correspondence with the sites and the ci-dimensional cell of pi is defined as 

vor F (pi) d = {x G X | D F (x||pi) < D F (x.\\pj) Vpj G S.} 

Since the Bregman bisectors of the first-type are hyperplanes, the cells of any diagram of the 
first-type are convex polyhedra. Therefore, first-type Bregman Voronoi diagrams are affine 
diagrams [U |5] . 

Similarly, the minimization diagram of the D[, % = 1, . . . , n, is called the second-type Bregman 
Voronoi diagram of S, which we denote by vor' F («S). A cell in vor' F (iS) is associated to each 
site pt and is defined as above with permuted divergence arguments: 

vor' F (pi) = f {x G X | Dp(pj||x) < D F (pj\\x) Vpj G S.} 

In contrast with the diagrams of the first-type, the diagrams of the second type have, in 
general, curved faces. 

Figure [9] illustrates these Bregman Voronoi diagrams for the Kullback-Leibler and the 
Itakura-Saito divergences. Note that the Euclidean Voronoi diagram is a Bregman Voronoi 
diagram since vor(«S) = vorp(«S) = vor^(<S) for -F(x) = ||x|| 2 . 

For asymmetric Bregman divergences Dp, we can further consider the symmetrized Bregman 
divergence S F = Dp and define a third-type Bregman Voronoi diagram vor'^(<S). The cell of 
vor'^(iS) associated to site pi is defined as: 

vor'^(pi) = f {x G X | 5p(x,Pi) < S F (x,Pj) Vpj G S.} 

From the Legendre duality between divergences, we deduce correspondences between the 
diagrams of the first and the second types. As usual, F* is the convex conjugate of F. 

Lemma 10 vor' i? («S) = V- l F{vor F *{S')) and vor F {S) = V^vor^. (£'))■ 



Proof: By Lemma |3j we have Djr(x||y) = £)p. (y'||x'), which gives vor^(pj) = {x G 
X | Df^p'Mx.') < D^(p;.||x') V P ;. G S'} = V- 1 F(vor , F .(tf i )). The proof of the second 
part follows the same path. □ 

Hence, constructing the second-type curved diagram vor' F (<S) reduces to constructing an 
affine diagram in the gradient space X' (and map the cells by VF _1 ). 

Let us end t his section by considering the case of symmetrized Bregman divergences in- 



troduced in ^2.3 SV(p,q) = -D^fpJJq) = Dp(q||p) where F is a 2<i-variate function and 



x x'] . As already noted in £2.3, x lies on the ci-manifold X — {[x x'] T | x G R d }. It 



follows that the symmetrized Voronoi diagram vor F (S) is the projection of the restriction 
to X of the affine diagram vor F (S) of M. 2d where S = {p«,Pi G S}. Hence, computing the 
symmetrized Voronoi diagram of S reduces to: 
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(b) 

Figure 9: Three types of Bregman Voronoi diagrams for (a) the Kullback-Leibler and (b) 
the Itakura-Saito divergences. First-type affine Bregman Voronoi diagram (red), second-type 
Bregman Voronoi diagram (blue) and symmetrized Bregman Voronoi diagram (green). 

1. computing the first-type Bregman Voronoi diagram vorp(S) of M. 2d , 

2. intersecting the cells of this diagram with the manifold X, and 

3. projecting all points of voip(S) D X to X by simply dropping the last d coordinates. 



4.2 Bregman Voronoi diagrams from polytopes 



Let H pv i — 1, . . . , n, denote the hyperplanes of X defined in £3.2 For any x e X, we have 
following Lemma [T] 



tf Pi (x)>tf Pj (x). 



D F (x\\pi) < D F (x||pj) <; 
The first-type Bregman Voronoi diagram of S is therefore the maximization diagram of the n 



linear functions if p .(x) whose graphs are the hyperplanes H Pi (see Figure 10). Equivalently, 
we have 



Theorem 6 The first-type Bregman Voronoi diagram votf(S) is obtained by projecting by 
Proj^ the faces of the (d + 1)- dimensional convex polyhedron Ti = Di-fff of X + onto X . 



From McMullen's upperbound theorem [31] and Chazelle's optimal half-space intersection 
algorithm [T3] , we know that the intersection of n halfspaces of W 1 has complexity (n L 
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Squared Euclidean distance 





(d) 



Figure 10: Voronoi diagrams as minimization diagrams. The first row shows minimization 
diagrams for the Euclidean distance and the second row shows minimization diagrams for the 
Kullback-Leibler divergence. In the first column, the functions are the non-linear functions 
.Dj(x) and, in the second column, the functions are the linear functions if p . (x), both leading 
to the same minimization diagrams. Isolines are shown in green. 
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and can be computed in optimal-time G(ralogn + n^-l) for any fixed dimension d. From 
Theorem [6] and Lemma [lUJ we then deduce the following theorem. 

Theorem 7 The Bregman Voronoi diagrams of type 1 or 2 of a set of n d- dimensional 
points have complexity Q{p\~^) and can be computed in optimal time 0(nlogn + n^~^). 
The third-type Bregman Voronoi diagram for the symmetrized Bregman divergence of a set 
of n d-dimensional points has complexity 0(n d ) and can be obtained in time 0{n d ). 

Apart from Chazelle's algorithm, several other algorithms are known for constructing the 
intersection of a finite number of halfplanes, especially in the 2- and 3-dimensional cases. 
See [TO! E] for further references. 



4.3 Bregman Voronoi diagrams from power diagrams 

The power distance of a point x to a Euclidean ball B = B(p, r) is defined as | |p — x| | 2 — r 2 . 
Given n balls Bi = B(jpi, rj), i = l,...,n, the power diagram (or Laguerre diagram) of 
the Bi is defined as the minimization diagram of the corresponding n functions -D«(x) = 
||pi — x|| 2 — r 2 . The power bisector of any two balls 5(pj,rj) and B(pj,rj) is the radical 
hyperplane of equation 2 (x, pj — p$) + | |pi|| 2 — | |cjj| | 2 + r 2 — rf = 0. Thus power diagrams are 
affine diagrams. In fact, as shown by Aurenhammer [21 [10], any affine diagram is identical 
to the power diagram of a set of corresponding balls. In general, some balls may have an 
empty cell in their power diagram. 

Since Bregman Voronoi diagrams of the first type are affine diagrams, Bregman Voronoi 
diagrams are power diagrams [31 EI] in disguise. The following theorem makes precise the 



correspondence between Bregman Voronoi diagrams and power diagrams (see Figure 11). 



Theorem 8 The first-type Bregman Voronoi diagram of n sites is identical to the power 
diagram of the n Euclidean spheres of equations 



(x - p^ x - pj) = pj) + 2(F(pi) - (pi, p-)), 



,71. 



Proof: We have 

Df^WPi) < D F (x.\\pj) 

^ -F(pi) - (x - Pi , P j) < -F( Pj ) - (x - Pj , P ;.) 

Multiplying twice the last inequality, and adding (x, x) to both sides yields 

(x,x) -2(x, P ;> - 2F(p i ) + 2(p ijP ;) < (x,x) -2(x, P ;.) -2F( Pj ) + 2(p j ,p' j ) 
(x - Pi, x - Pi) -r i < (x - Pj , x - pj) - r jt 
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where r? = (p^pj) + 2(F( Pi ) - (p <)P J)) and rf = (p^) + 2(F( Pj ) - (p^-)). The last 
inequality means that the power of x with respect to the Euclidean (possibly imaginary) ball 
^(Pi> r i) is no more than the power of x with respect to the Euclidean (possibly imaginary) 
ball B(p' prj ). □ 

As already noted, for -F(x) = ^||x|| 2 , vorp(«S) is the Euclidean Voronoi diagram of S. Ac- 
cordingly, the theorem says that the centers of the spheres are the Pi and rf = since 
p'i = pj. Figure 11 displays affine Bregman Voronoi diagram^] and their equivalent power 



diagrams for the squared Euclidean, Kullback-Leibler and exponential divergences. 

Note that although the affine Bregman Voronoi diagram obtained by scaling the divergence 
Dp by a factor A > does not change, the equivalent power diagrams are not strictus senso 
identical since the centers of corresponding Euclidean balls and radii are mapped differently. 
See the example of the squared Euclidean distance depicted in Figure [Tl](a) . Since Power 
diagrams are well defined "everywhere" , this equivalence relationship provides a natural way 
to extend the scope of definition of Bregman Voronoi diagrams from X C R d to the full 
space lR d . (That is, Bregman Voronoi diagrams are power diagrams restricted to X.) 

To check that associated balls may be potentially imaginary, consider for example, the 
Kullback-Leibler divergence. The Bregman generator function is F(pc) = Y2i x i ^°S x i an d the 
gradient is VF(x) = [logxi ... \ogXd] T - A point p = [pi ... pd] T G X maps to a Euclidean 
ball of center p' = [logpi ... \ogpd] T with radius r 2 = X^(l°g 2 Pi ~ %Pi)- Thus for points p 
with coordinates pi > ^logpf for i G {1, ■■■,(£}, the squared radius r 2 , is negative, yielding 
an imaginary ball. See Figure |TT|b). 

It is also to be observed that not all power diagrams are Bregman Voronoi diagrams. Indeed, 
in power diagrams, some balls may have empty cells while each site has necessarily a non 



empty cell in a Bregman Voronoi diagram (See Figure 11 and Section 4.4 for a further 
discussion at this point). 

Since there exist fast algorithms for constructing power diagrams |36j . Theorem [8] provides 
an efficient way to construct Bregman Voronoi diagrams. 



4.4 Generalized Bregman divergences and their Voronoi diagrams 
Weighted Bregman Voronoi diagrams 

Let us associate to each site p« a weight Wi G M.. We define the weighted divergence between 
two weighted points as WD F (pi\\pj) = f _Dp(pj| |p.,) + Wi — Wj. We can define bisectors 
and weighted Bregman Voronoi diagrams in very much the same way as for non weighted 
divergences. The Bregman Voronoi region associated to the weighted point (p«, Wi) is defined 
as 

vorp(pi, Wi) = {x G X | D F (x\\pi) + w { < D F (x\\pj) + Wj Vpj G S}. 



See Java™ applet at http : //www . csl . sony . co . jp/person/nielsen/BVDapplet/| 
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Affine Bregman Voronoi diagram 




Equivalent Power diagram 




(a) Squared Euclidean distance (F(x) 





(b) Kullback-Leibler divergence (F(x) = x i l°g 





(c) Exponential loss divergence {F(x) = Y2i ex P x i 



Figure 11: Affine Bregman Voronoi diagrams (left column) can be computed as power dia- 
grams (right column). Illustrations for the squared Euclidean distance (a), Kullback-Leibler 
divergence (b), and exponential divergence (c). Circles are drawn either in grey to denote 
positive radii, or in red to emphasize imaginary radii. Observe that although some cells 
of the power diagrams may be empty, all cells of the affine Bregman Voronoi diagram are 
necessarily non-empty. 
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Observe that the bisectors of the first-type diagrams are still hyperplanes and that the 
diagram can be obtained as the projection of a convex polyhedron or as the power diagram 
of a finite set of balls. The only difference with respect to the construction of Section |4.2| 
is the fact that now the hyperplanes H Pi are no longer tangent to T since they are shifted 
by a ^-displacement of length Wi. Hence Theorem [7] extends to weighted Bregman Voronoi 
diagrams. 

Theorem 9 The weighted Bregman Voronoi diagrams of type 1 or 2 of a set of n 
d-dimensional points have complexity 0(riL~J) and can be computed in optimal time 
G(nlogn + n^^H). 

&;-order Bregman Voronoi diagrams 

We define the fc-order Bregman Voronoi diagram of n punctual sites of X as the subdivision 
of X into cells such that each cell is associated to a subset T C S of k sites and consists of 
the points of X whose divergence to any site in T is less than the divergence to the sites not 
in T. Similarly to the case of higher-order Euclidean Voronoi diagrams, we have: 

Theorem 10 The k-order Bregman Voronoi diagram ofn d-dimensional points is a weighted 
Bregman Voronoi diagram. 

Proof: Let S\, «S 2 , . . . denote the subsets of k points of S and write 

D ^ = I E d ^\\pj) 

= F ^-l T, F ^) + l E < x -p;>p;-> 

= F(x) -F(ci) - (x-CfccJ) +w, 
= WD F {x\\ Ci ) 

where c« = V" 1 ^ f | SjeS; P'j) an< ^ the we ight associated to Cj is W{ = F(ci) — (c^ c£) — 

I Y.,. ,Ai--(v,) • (p,-i>:))- 

Hence, «Sj is the set of the k nearest neighbors of x iff -Dj(x) < Dj(x) for all j or, equivalently, 
iff x belongs to the cell of c, in the weighted Bregman Voronoi diagram of the Cj. □ 

A>bag Bregman Voronoi diagrams 

Let F±, be k strictly convex and differentiable functions, and cx = [a\ ... ak] T € a 

vector of positive weights. Consider the d-variate function F a = Y^i=i a i^i- By virtue of the 
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positive additivity property rule of Bregman basis functions (Property [3]), D Fa is a Bregman 
divergence. 

Now consider a set S = {pi,...,p n } of n points of IR d . To each site p i; we associate a 

weight vector cxi = [af^ ... af^Y inducing a Bregman divergence Dp a .(x||pj) = f D ai (x||pj) 
anchored at that site. Let us consider the first-type of /c-bag Bregman Voronoi diagram 
(/c-bag BVD for short). The first-type bisector K F (pi,pj) °f tw° weighted points (p*, a*) 
and (pj,oij) is the locus of points x at equidivergence to Pi and pj. That is, K F {p^pj) = 
{x G A? | D cti (x||pi) = Z) a .(x||pj)}. The equation of the bisector is simply obtained using 
the definition of Bregman divergences (Eq. [I]) as 



F ai (x) - F ai ( Pi ) - (x - Pi , VF tti ( Pi )) = F Qj (x) - F a .(pj) - (x - Pi , VF a ,( Pj )). 
This yields the equation of the first-type bisector K F (pi, Pj) 



^(af - af)F,(x) - (x, VF„.( Pj ) - VF Qi ( P! )) + c = 0, 



(13) 



i=i 



where c is a constant depending on weighted sites ( P i, a») and (pj,acj). Note that the 
equation of the first-type fc-bag BVD bisector is linear if and only if <Xj = <y.j (i.e., the case 
of standard BVDs). 

Let us consider the linearization lifting x i— > x = [x -Fi(x) ... Fk(x)] T that maps a point 
xGR d into a point x in IR d+fc . Then Eq. 13 becomes linear, namely (x, a) + c = with 



That is, first-type bisectors of a A;-bag BVD are hyperplanes of IR d+fc . Therefore the com- 

I k-\-a I 

plexity of a /c-bag Voronoi diagram is at most 0(?i,L^~J) ; since it can be obtained as the 
intersection of the affine Voronoi diagram in M. d+k with the convex d- dimensional submani- 
fold {x = [x Fi(x) ... F fc (x)] T | x G R d }. 



Theorem 11 The k-bag Voronoi diagram (for k > 1) on a bag of d-variate Bregman diver- 
gences of a set ofn points ofW 1 has combinatorial complexity 0(n^-^2~^) and can be computed 
within the same time bound. 



Further, using the Legendre transform, we define a second-type (dual) /c-bag BVD. We have 
VF a = Yli=i ociVFi and F* = / V^ 1 . (Observe that F* ^ ^, =1 aiF* in general.) 

/c-bag Bregman Voronoi diagrams are closely related to the anisotropic diagrams of Labelle 
and Shewchuk [27] that associate to each point x G X a metric tensor M x which tells how 
lengths and angles should be measured from the local perspective of x. Labelle and Shewchuk 
relies on a deformation tensor (ideally defined everywhere) to compute the distance between 
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any two points p and q from the perspective of x as d x (p, q) = a/(p — q) T M x (p — q). 
Let d x (p) = d x (x, p). The anisotropic Voronoi diagram, which approximates the ideal but 
computationally prohibitive Riemannian Voronoi diagram, is defined as the arrangement of 
the following anisotropic Voronoi cells: 

Vor( Pi ) = {xG X | rf Pl (x) < d p .(x) Vj G {l,...,n}}, Vz G {l,...,n}. 

It follows that all anisotropic Voronoi cells are non-empty as it is the case for fc-bag Bregman 
Voronoi diagrams. 

Hence, the site weights of a k-b&g Bregman Voronoi diagram sparsely define a tensor di- 
vergence that indicates how divergences should be measured locally from the respective 
bag of divergences. Noteworthy, our study of fc-bag Bregman Voronoi diagrams shows 
that the anisotropic Voronoi diagram also admits a second-type anisotropic Voronoi dia- 
gram, induced by the respective dual Legendre functions of the Bregman basis functions 
of the quadratic distance monomials. The Legendre dual of a quadratic distance function 
c?m(p, q) = (p~ q) T M(p — q) induced by positive-definite matrix M is the quadratic distance 
di&-i. (Matrix M is itself usually obtained as the inverse of a variance-covariance matrix S 
in so-called Mahalanobis distances.) 



5 Bregman triangulations 

Consider the Euclidean Voronoi diagram vor(«S) of a finite set S of points of R d (called sites). 
Let / be a face of vor(S) that is the intersection of k (i-cells of vor(«S). We associate to / a 
dual face /*, namely the convex hull of the sites associated to the subset of cells. If no subset 
of d + 2 sites lie on a same sphere, the set of dual faces (of dimensions to d) constitutes a 
triangulation embedded in M. d whose vertices are the sites. This triangulation is called the 
Delaunay triangulation of S, noted del(«S). The correspondence defined above between the 
faces of vor(5) and those of del(<S) is a bijection that satisfies: / C g =>- g* C /*. We say 



that del(iS) is the geometric dual of vor(«S). See Figure 12 

A similar construct is known also for power diagrams. Consider the power diagram of a finite 
set of balls of R d . In the same way as for Euclidean Voronoi diagrams, we can associate a 
triangulation dual to the power diagram of the balls. This triangulation is called the regular 
triangulation of the balls. The vertices of this triangulation are the centers of the balls whose 
cell is non empty. 

We derive two triangulations from Bregman Voronoi diagrams. One has straight edges and 
captures some important properties of the Delaunay triangulation. However, it is not always 
the geometric dual of the corresponding Bregman Voronoi diagram. The other one has 
curved (geodesic) edges and is the geometric dual of the Bregman Voronoi diagram. 
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Figure 12: Ordinary Voronoi diagram (red) and geometric dual Delaunay triangulation 
(blue). 



5.1 Bregman Delaunay triangulations 

Let S be the lifted image of S and let T be the lower convex hull of S, i.e. the collection 
of facets of the convex hull of S whose supporting hyperplanes are below S. We assume in 
this section that S is in general position if there is no subset of d + 2 points lying on a same 
Bregman sphere. Equivalently (see Lemma [5]), S is in general position if no subset of d + 2 
points pi lying on a same hyperplane. 

Under the general position assumption, each vertex of Ti — fljii^ is the intersection of 
exactly d+1 hyperplanes and the faces of T are all simplices. Moreover the vertical projection 
of T is a triangulation delp(«S) = Proj^(T) of S embedded in X C M. d . Indeed, since 
the restriction of Proj^ to T is bijective, delp(«S) is a simplicial complex embedded in 
X. Moreover, since F is convex, delp(<S) covers the (Euclidean) convex hull of S, and 
the set of vertices of T consists of all the p\. Consequently, the set of vertices of deli? (S) 



is S. We call delp(«S) the Bregman Delaunay triangulation of S (see Fig. 13). When 
-F(x) = ||x|| 2 , delp(«S) is the Delaunay triangulation dual to the Euclidean Voronoi diagram. 
This duality property holds for symmetric Bregman divergences (via polarity) but not for 
general Bregman divergences. 

We say that a Bregman sphere a is empty if the open ball bounded by a does not contain 
any point of S. The following theorem extends a similar well-known property for Delaunay 
triangulations whose proof (see, for example [!(]]) can be extended in a straightforward way 



to Bregman triangulations using the lifting map introduced in Section 3.2 



Theorem 12 The first-type Bregman sphere circumscribing any simplex of delp(5) is 
empty. delp(«S) is the only triangulation of S with this property when S is in general position. 

Several other properties of Delaunay triangulations extend to Bregman triangulations. We 
list some of them. 
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Figure 13: Bregman Delaunay triangulation as the projection of the convex polyhedron T. 

Theorem 13 (Empty ball) Let v be a subset of at most d+1 indices in {1, . . . ,n}. The 
convex hull of the associated points Pi, i G v , is a simplex of the Bregman triangulation of 
S iff there exists an empty Bregman sphere o passing through the Pi, i Ev. 

The next property exhibits a local characterization of Bregman triangulations. Let T(S) be 
a triangulation of S. We say that a pair of adjacent facets fx = (/, Pi) and f 2 = (/, P2) 
of T(S) is regular iff pi does not belong to the open Bregman ball circumscribing f 2 and 
P2 does not belong to the open Bregman ball circumscribing fx (the two statements are 
equivalent for symmetric Bregman divergences). 

Theorem 14 (Locality) Any triangulation of a given set of points S (in general position) 
whose pairs of facets are all regular is the Bregman triangulation of S. 

Let S be a given set of points, deli?(iS>) its Bregman triangulation, and T(S) the set of all 
triangulations of S. We define the Bregman radius of a ci-simplex r as the radius noted r(r) 
of the smallest Bregman ball containing r. The following result is an extension of a result 
due to Rajan for Delaunay triangulations [37J. 

Theorem 15 (Optimality) We have deLv(iS) = min^gT-^) max rg j'r(r). 
The proof mimics Rajan's proof [37] for the case of Delaunay triangulations. 



5.2 Bregman geodesic triangulations 



We have seen in Section 4J3 that the Bregman Voronoi of a set of points S is the power 
diagram of a set of balls £>' centered at the points of S' (Theorem [8J. Write reg F (£>') for the 
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(b) 

Figure 14: First-type Kullback-Leibler Bregman Voronoi diagram (a) obtained from the 
corresponding power diagram (b), and its associated dual regular triangulation rooted at 
gradient vertices (blue). 

dual regular triangulation dual to this power diagram. This triangulation^] is embedded in 



X' and has the points of S' as its vertices (see Figure 14). The image of this triangulation by 
V _1 -F is a curved triangulation whose vertices are the points of S. The edges of this curved 



triangulation are geodesic arcs joining two sites (see Section 3.3). We call it the Bregman 



geodesic triangulation of S, noted del' F (iS) (see Figure 15). 



Theorem 16 The Bregman geodesic triangulation del' F (S) is the geometric dual of the 1st- 
type Bregman Voronoi diagram of S. 

Proof: We have, noting = for the dual mapping, and using Theorem [s] 

vot f {S) = pow(fi') ee reg(S') = VF{del' F {S)). 

□ 

Observe that del^(«S) is, in general, distinct from delp(«S), the Bregman Delaunay triangu- 
lation introduced in the previous section. However, when the divergence is symmetric, both 
triangulations are combinatorially equivalent and dual to the Bregman Voronoi diagram of 
S. Moreover, they coincide exactly when F is the squared Euclidean distance. 



6 Applications 

In this section, we give some applications related to computational geometry and machine 
learning. 

4 Applet at http://www.csl.sony.co.jp/person/nielsen/BVDapplet/ 
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(a) Ordinary Delaunay 



(b) Exponential loss 



(c) Hellinger-like divergence 



Figure 15: An ordinary Delaunay triangulation (a) and two Bregman geodesic triangulations 
for the exponential loss (b) and for the Hellinger-like divergence (c). 



6.1 Centroidal Bregman Voronoi diagrams and Lloyd quantization 

Let D be a domain of X and p(x) be a density function defined over T>. We define the 
Bregman centroid of D as the point c* e T> such that c* = argmin ceX , J x&v p(x)Df(x.\\c) dx. 
The following lemma states that the mass Bregman centroid of T> is uniquely defined and 
independent of F. 

Lemma 11 The Bregman centroid of T> coincides with the mass centroid ofT>. 
Proof: 

V c / p(x) D F (x||c)dx 



V c / p(x) (F(x) - F(c) - (x - c, VF(c)»dx 



p(x) V 2 F(c)(x-c)dx 



xex> 

2; 



-V F(c)( / p(x) xdx - c / p(x) dx 



Hence, c* = If^Jp. □ 

When x is a random variable following the probability density p(x), J xgX , p(x) £)^(x| |c) dx is 
called the distortion rate associated to the representative c, the optimal distortion-rate func- 
tion J xgX ,p(x) Dp(pc\\c*) dx is called the Bregman information, and c* is called the Bregman 
representative. The above result states that the optimal distortion rate exists and does not 
depend on the choice of the Bregman divergence, and that the Bregman representative c* 
is the expectation E(x) of x. This result extends an analogous result in the discrete case 
(finite point sets) studied in [6]. 



36 



Computing a centroidal Bregman Voronoi diagram of k points can be done by means of 
Lloyd's algorithm [30]. We select an initial set of k points. Then, we iteratively compute a 
Bregman Voronoi diagram and move the sites to the Bregman centroids of the corresponding 
cells in the diagram. Upon convergence, the output of the algorithm is a local minimizer 
of f((Pi,Vi),i = l,...,k) = Yh=i J xeVi D F (x\\pi) dx , where {Pi}f = i denotes any set of k 
points of X and {l^}^ =1 denotes any tesselation of X into k regions. See [H] for a further 
discussion and applications of centroidal Voronoi diagrams. 



6.2 e-nets 



Lloyd's algorithm intends to find a best set of k points for a given k so as to minimize a 
least-square criterion. Differently, we may want to sample a compact domain T> C X up to a 
given precision while minimizing the number of samples. Instead of a least-square criterion, 
we define the error associated to a sample P as error(P) = max xe x> min p . e p Dp(x\ |p$). A 
finite set of points P of T> is an e-sample of T> iff error(P) < e. 

An e-sample P is called an e-net if it satisfies the sparsity condition: 
max(Dp(p\\q), Dp(q\\p)) > e for any two points p and q in P. 

We will see how to construct an e-net. For simplicity, we assume in the rest of the section 
that T> is a convex polytope. Extending the results to more general domains is possible. 

Let P cT>, vorp(P) be the Bregman Voronoi diagram of P and vorp|x>(P) be its restriction 
to T>. Write V for the set of vertices of voyf\v{P)- V consists of vertices of vorp(P) and 
intersection points between the edges of vorp(P) and the boundary of T>. The following 
lemma states that error(P) can be computed by examining only a finite number of points, 
namely the points of V. 



Lemma 12 error(P) = max ve y min p . e pPp(x||pj). 



Proof: Let x G V, p x the point of P closest to x and the associated cell of vor F p(P) 
(which contains x). V x is a bounded polytope whose vertices belongs to V. Let w be the 
vertex of most distant from p x . We have Pp(x||p x ) < Pp(w||p x ). This is a consequence 
of the convexity of F and of the fact that Pi?(x||p) is measured by the vertical distance 
between x and H p (Lemma [T]). □ 

An e-net of T> can be constructed by the following greedy algorithm originally proposed by 
Ruppert in the context of mesh generation [39]. See also [20]. We initialize the sample set 
Po with d points of T> lying at distance greater than e from one another. Then, at each step, 
the algorithm looks for the point Vj of T> that is the furthest (for the considered Bregman 
divergence) from the current set of samples P«. By Lemma 12 this step reduces to looking at 
the vertices of vor F\v{Pi) ■ If -D-F(x||vj) < e, the algorithm stops. Otherwise, we take v« as a 
new sample point, i.e. pj + i = Vj, we update the set of sample points, i.e. p + i = pU {pi+i}, 
and insert p^+i in the Bregman Voronoi diagram of the sample points. Upon termination, the 
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set of sample points P t satisfies the hypothesis of Lemma 12 and therefore P t is an e-sample 
of T>. Moreover, for any two points p and q of Pt, we have Z}p(p||q) > e or Dp(q||p) > e, 
depending on whether p has been inserted after or before q. Indeed, we only insert a point 
if its divergence to the points of the current sample is greater than e. Hence, Pt is an e-net 
of V. 

To prove that the algorithm terminates, we need the following lemma. Given a Bregman ball 
B(c,r), we define the biggest Euclidean ball EB(c,r') contained in B(c,r) and the smallest 
Euclidean ball EB(c,r") containing B(c,r). 

Lemma 13 Let P be a strictly convex function of class C 2 , there are constants 7' and 7" 
(that do not depend on c nor on r) such that r' 2 > 7V and r" 2 < 7'V. 

Proof: According to Taylor's formula, there exists a point t of the open segment xc such 
that 

F(x) = F(c) + (x - c, VF(c)> + - (x - c) T V 2 F(t)(x - c). 

Hence, 

D F (-x\\c) = F(x) - F(c) - (x - c, c'> = i (x - c) T V 2 F(t)(x - c), (14) 
where t is a point of the open segment xc. 

Since F is strictly convex, the Hessian matrix is positive definite (i.e., x T V 2 F(t)x > for 
all x in X), and the domain T> being compact, there exist two constants rf and rj" such that, 
for any y G T>, < r/" < ||V 2 -F(y)|| < 77'. If ||x — c|| 2 > % (Frobenius matrix norm), we 



deduce from Equation (14) that £)_p(x||c) > r. Therefore, B(c,r) C EB(c, v „ 



If || x — c|| < =7, we have using again Equation (|14|) 



L> F (x||c) < \ l|x-c|| 2 < r. 



2 



Therefore, EB(c, ^) C B(c,r). □ 



Let p and q be two points such that _Dp(p||q) = r. Observing that EB(p,r') C EB(p, ||p 
q||) C EB(p,r"), we deduce from the above lemma that 



'7'r < ||p — q|| < y7"r (15) 
and 

^D F (p\\q) < D F (q\\p) < ^-D F {p\\q). 

Another consequence of the lemma is that the volume of any Bregman ball of radius at least 
r > 0, is bounded away from (when F is of class C 2 ). Hence, since T> is compact, the 
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algorithm cannot insert infinitely many points and therefore terminates. Moreover, the size 
of the sample output by the algorithm can be bounded, as stated in the next lemma. Write 
V^ e = {x| 3y G V, ||x - y || < e}. 

Lemma 14 // F is of class C 2 , the algorithm terminates. If P t denotes the final set of 
sample points, we have \P t \ = O ( ^ffi j . 

Proof: We have already shown that the algorithm terminates. Let P t be the set of points 
that have been inserted by the algorithm, excluding the initial set (of constant size). Let 
r(x) = inf{r : \EBfa r) [)P t \ > 2} and B p = EB(p, ^), p G P t . It is easy to see that r 
is 1-Lipschitz and that the Euclidean balls B p , p G Pt are disjoint. Let q be a point of Pt 



closest to p : r(p) = ||p — q|| and, as noticed above, max(.Dp(p||q), D F (q\\p)) > e. Eq. 15 
then implies that r(p) = ||p — q|| > y/j* e. Consider now the midpoint m of pq and write 
t for the point of Pt that minimizes DF(m\\.). Since T> is convex, m G T> and, according to 



the definition of q, ||m — p|| < ||m — t||. Eq. 15 and the fact that P t is an e-sample of T> 



then yield ||m — t|| < s. In summary, we have 



'is <r(p) = ||p-q|| < 2^7" £■ (16) 

The right inequality shows that all the balls B p , p G P t , are contained in T>- v where rj = 
a/7" e. We can now bound the size of P t . 

Iv<v 7%) ^ S p6 P t $B p nv<v ( tlie balls B p have disjoint interiors) 

> E pe p VJ ^p ^) * r(p) + ||p - x|| < § r(p)) 



> u I P I 
^ yi | -ft | 

where C = § if d = 2p and C = ^^ffi^ if d = 2p - 1. 



Using again the Lipschitz property of r and Eq 16 we have for all x G B p 



r(x) > r(p) - ||x - p|| > ^ r(p) > ^ a/t 7 ! 



We deduce 

□ 

A geometric object O is said a-fat [7] if the ratio ^ of the radius r + of the smallest ball 
enclosing O over the radius r~ of the largest ball inscribed in O is bounded by a: ^ < a. 
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Euclidean balls are therefore 1-fat, namely the fattest objects. It has been shown that 
considering the fatness factor for a set of objects yields in practice efficient tailored data- 
sensitive algorithms [7J by avoiding bad configurations of sets of skinny objects. A direct 
consequence of Lemma |i~3| is that Bregman balls (in fixed dimensions) are fat (i.e., a = 0(1)) 
on any compact domain: 

Corollary 3 For C 2 Bregman generator functions, Bregman balls on any compact domain 
are fat. 

Proof: Indeed, consider any Bregman ball defined on a compact domain for a C 2 strictly 
convex and differentiable Bregman generator function F. Its fatness a is upper bounded by 

Recall 



T 

v" ' 



. ,, , where 7' and 7" are the two constants (depending on F and V) of Lemma 



13 



that Lemma 13 considers concentric Euclidean balls ham sandwiching a Bregman ball, all 
centered at position c. We have a<^<^<% = 0(1) since r~ < r~ and r+ > r + , 
where r+ (respectively, r~) denote the radius of the smallest enclosing (respectively, largest 
inscribed) Euclidean ball centered at c. The fatness property simply means that we can 
cover any Bregman ball by a constant number of (convex) Euclidean balls. □ 

Thus, since Bregman balls are fat on compact domains, we can build efficient data-structures 
for point location with applications to piercing (geometric O-transversal) and others, as 
described in |19|. 



6.3 VC-dimension, classification and learning 

Some important classification rules rely on Voronoi diagrams; furthermore, the analysis of 
classification rules (complexity or statistical generalization) sometimes makes use of concepts 
closely related to Voronoi diagrams. Extending the rules and analyses to arbitrary Bregman 
divergences, with important related consequences (such as the eventual lost of convexity) is 
thus particularly interesting for classification, and we review here some notable consequences. 

In supervised classification, we are generally interested in capturing the joint structure of X 
and a set of classes, {0, 1} in the simplest case. For this objective, we build representations 
of concepts, i.e. functions that map X to the set of classes. A concept class Ti is a set of 
concept representations h : X — > {0, 1}; for example, should h be a Bregman ball, it would 
classify the points outside the ball, and 1 the points inside. Armed with these definitions, 
our supervised classification problem becomes the following one. A so-called target concept, 
c, which is unknown, labels the points of X; we have access to its labeling throughout a 
sampling process: we retrieve examples (i.e., pairs (x,c(x))), independently at random, 
according to some unknown but fixed distribution T> over the set {(x,c(x)) : x £ X}. The 
question is: what are the conditions on Ti that guarantee the possibility to build, within 
reasonable time, some h £ TC agreeing as best as possible with c, with high probability? 
While the complexity requirement is usual in computer science, the fact that we require 
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adequacy with high probability better than systematically is also a necessary requirement, 
as there is always the possibility of an extremely bad sampling that would prevent any 
efficient learning (e.g. we have drawn the same example all the time). In general, rather 
than directly sampling the domain, we work with a finite data set S of examples which is 
supposed to be sampled this way. 

From the statistical standpoint, learning requires to find a good balance between the accu- 
racy, i.e. the goodness-of-fit of h as measured on S, and the capacity of TC, i.e. its ability 
to learn (or fit in generalization) the data with the smallest number of errors. Consider for 
example geometric figures in the plane and the "square" concept. Intuitively, an TC with too 
large capacity is like the person who picks a huge quantity of geometric figures including 
squares, memorizes each of them, and then rejects every square that would not exactly be in 
its collection (edge lengths, colors, etc.). An TC with too little capacity is like the lazy person 
who keeps as sole concept the fact that squares have four edges. Both extremal situations 
mean little generalization capabilities, but for different reasons. 

There have been intensive lines of works on the measures of this capacity, and one of the 
most popular is the VC-dimension [T7j. Informally, the VC-dimension of TC is the size of 
the largest dataset S for which TC shatters S, i.e. for which TC contains all the classifiers 
that could perform any of the 2' 15 ' possible labelings of the data. To be more formal, let 
]Tft(iS>) = {(h(pi), h(p 2 ), h(p n j) | h G TC} denote the set of all distinct tuples of labels on 
S that can be performed by elements of TL. While it always holds that ri-^(n) < 2™, the 
maximal n for which ITft(n) = 2 n is the VC-dimension of TC, VCdim(7Y). The importance of 
the VC-dimension comes from the fact that it allows to bound the behavior of the empirical 
optimal classifier in a distribution-free manner [T7j. In particular, if the VC-dimension is 
finite, the average error probability of the empirical optimal classifier tends to when the 
size of the training data set increases. The following lemma proves that the VC-dimension 
of Bregman balls is the same as for linear separators, and this does not depend on the choice 
of F. 



Theorem 17 The VC dimension of the class of all Bregman balls Bp ofW 1 (for any given 
strictly convex and differentiable function F) is d + 1. 



Proof: We use the lifting map introduced in Section Given a set S of points in M. d , we 
lift them onto JF, obtaining S G 

Let Bp be a Bregman ball and write a for the Bregman sphere bounding Bp. From Lemma [5j 
we know that, for any p G M d , p G B iff p G H^. For a given function F, let Bp denote the 
set of all Bregman balls, and let Tip denote the set of all lower halfspaces of M. d+1 . It follows 
from the observation above that B shatters S iff TC shatters S. Hence the VC dimension of 
B over the sets of points of M. d is equal to the VC dimension of TC over the sets of points of 
T c R d+1 . 

Since the points of S are in convex position, they are shattered by TC iff the affine hull of 
their convex hull is of dimension strictly less than the dimension of the embedding space, i.e. 
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d + 1, which happens iff |<S| < d + 2. Indeed otherwise, the subset of vertices of any facet of 
the upper convex hull of S cannot be obtained by intersecting S with a lower halfspace (an 
upper halfspace would be required). Hence, the VC dimension of Bregman balls is at most 
d+1. 

It is exactly d + 1 since any set of d + 1 points on T in general position generates a d- 
dimensional affine hull A that cannot be shattered by less than d+1 hyperplanes of A. The 
same result plainly holds for hyperplanes of IR d+1 since we can associate to each hyperplane 
h of A a hyperplane H of M. d+1 such that h = H H A. □ 

This result does not fall into the general family of VC bounds for concept classes parameter- 
ized by polynomial-based predicates [23] , it is mostly exact, and it happens not to depend 
on the choice of the Bregman divergence. This has a direct consequence for classification, 
which is all the more important as Bregman balls are not necessarily convex (see Figure [5]). 
Because the capacity of Bregman balls is not affected by the divergence, if we fit this diver- 
gence in order to minimize the empirical risk (risk estimated on S), then there is an efficient 
minimization of the true risk (risk estimated on the full domain X), as well. There is thus 
little impact (if any) on overfitting, one important pitfall for classification, usually caused 
by over-capacitating the classifiers by tuning too many parameters. 

Some applications of our results in supervised learning also meet one of the oldest classifica- 
tion rule: the fc-Nearest Neighbors (fc-NN) rule [22], in which a new observation receives the 
majority class among the set of its k nearest neighbors, using e.g. /c-order Voronoi diagrams 



of S (Section 4.4). Various results establish upperbounds for the fc-NN rule that depend 
on the Bayes risk (the true risk of the best possible rule) [T7j. The choice of the proximity 
notion between observations (it is often not a metric for complex domains) is crucial: if it 
is too simple or oversimplified, it degrades the fc-NN results and may even degrade Bayes 
risk as well; if it is too complicated or complexified, it may degrade the test results via the 
capacity of the rule. Searching for accurate "distance" notions has been an active field of 
research in machine learning in the past decade [12]. Our results on the linearity of the 
Bregman Voronoi diagrams essentially show that we can mix arbitrary Bregman divergences 
for heterogenous data (mixing binary, real, integer values, etc.) without losing anything 
from the capacity standpoint. 



Range spaces of finite VC-dimensions have found numerous applications in Combinatorial 
and Computational Geometry. We refer to Chazelle's book for an introduction to the subject 
and references wherein [15] . In particular, Bronnimann and Goodrich [13] have proposed an 
almost optimal solution to the disk cover algorithm, i.e. to find a minimum number of disks 
in a given family that cover a given set of points. Theorem 17 allows to extend this result 
to arbitrary Bregman ball cover (see also [2T]). 
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7 Conclusion 



We have defined the notion of Bregman Voronoi diagrams and showed how these geometric 
structures are a natural extension of ordinary Voronoi diagrams. Bregman Voronoi diagrams 
share with their Euclidean analogs surprisingly similar combinatorial and geometric proper- 
ties. We hope that our results will make Voronoi diagrams and their relatives applicable in 
new application areas. In particular, Bregman Voronoi diagrams based on various entropic 
divergences are expected to find applications in information retrieval (IR), data mining, 
knowledge discovery in databases, image processing (e.g., see [21] )• The study of Bregman 
Voronoi diagrams raises the question of revisiting computational geometry problems in this 
new light. This may also allow one to tackle uncertainty ('noise') in computational geometry 
for fundamental problems such as surface reconstruction or pattern matching. 

A limitation of Bregman Voronoi diagrams is their combinatorial complexity that depends 
exponentially on the dimension. Since many applications are in high dimensional spaces, 
building efficient data-structures is a major avenue for further research. 
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